Clean up race condition in daemon reports #1402
Merged
When prterun is running on a node whose topology
differs from that of the other nodes, AND daemon
rank=1 is delayed in sending its callback message so
that one or more other daemons report first, we
segfault because:

- the first daemon to report records its topology
  signature and is immediately asked to return its
  topology
- subsequent daemons with the SAME signature attempt
  to use the still-NULL topology stored for that
  signature in the topologies array to define their
  available CPUs

Resolve this by caching the reports of any daemons
that arrive before rank=1, so that their topologies
can be compared against rank=1's once it reports.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit fc83ca4)